
[Refactor] [6/N] to simplify the vLLM openai chat_completion serving architecture#32240

Merged
chaunceyjiang merged 12 commits into vllm-project:main from chaunceyjiang:vllm_open_refactor
Jan 13, 2026
Conversation

@chaunceyjiang
Collaborator

@chaunceyjiang chaunceyjiang commented Jan 13, 2026

Purpose

This PR refactors the OpenAI chat_completion serving architecture and splits vllm/entrypoints/openai/protocol.py.

TODO
[ ] completion_serving
[ ] responses_serving
[ ] transcription_serving
[ ] tests re-org
[ ] compatibility with the previous import of vllm/entrypoints/openai/protocol.py

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after results comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@mergify mergify bot added deepseek Related to DeepSeek models frontend llama Related to Llama models qwen Related to Qwen models gpt-oss Related to GPT-OSS models labels Jan 13, 2026
@chaunceyjiang chaunceyjiang changed the title [Refactor] [6/N] to simplify the vLLM openai serving architecture [Refactor] [6/N] to simplify the vLLM openai chat_completion serving architecture Jan 13, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the OpenAI serving architecture by restructuring files and updating import paths. The changes are mostly mechanical, but I found a couple of critical issues in the newly added vllm/entrypoints/openai/chat_completion/protocol.py file: a syntax error in an import statement and a missing import for FunctionDefinition. These issues will prevent the code from running and need to be addressed.

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@mergify mergify bot added the multi-modality Related to multi-modality (#4194) label Jan 13, 2026
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@chaunceyjiang chaunceyjiang added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 13, 2026
Member

@DarkLight1337 DarkLight1337 left a comment


LGTM as long as tests pass

@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Jan 13, 2026
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@chaunceyjiang chaunceyjiang enabled auto-merge (squash) January 13, 2026 11:14
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@chaunceyjiang chaunceyjiang enabled auto-merge (squash) January 13, 2026 11:18
@chaunceyjiang chaunceyjiang merged commit fefce49 into vllm-project:main Jan 13, 2026
50 checks passed
@chaunceyjiang chaunceyjiang deleted the vllm_open_refactor branch January 13, 2026 13:06
sammysun0711 pushed a commit to sammysun0711/vllm that referenced this pull request Jan 16, 2026
…architecture (vllm-project#32240)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
…architecture (vllm-project#32240)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…architecture (vllm-project#32240)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
…architecture (vllm-project#32240)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
shun-cohere added a commit to cohere-ai/melody that referenced this pull request Apr 14, 2026
- Branch OpenAI entrypoint imports on vllm version: vllm > 0.14.1 uses
  the reorganized paths introduced in vllm-project/vllm#32240
- Add ty: ignore[unresolved-import] suppressions for version-gated
  imports that may not exist in the installed vllm
- Matrix py-check CI job across vllm 0.14.1 and 0.15.1
- Fix is_reasoning_end signature: list[int] -> Sequence[int] to match
  the abstract base class
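The `is_reasoning_end` signature fix mentioned above can be illustrated with a minimal sketch (the class names and token id below are hypothetical, not the actual melody or vLLM code): typing the parameter as `Sequence[int]` rather than `list[int]` matches the abstract base class and lets callers pass lists, tuples, or any other integer sequence.

```python
from collections.abc import Sequence


class ReasoningParserBase:
    """Sketch of an abstract base that accepts any integer sequence."""

    def is_reasoning_end(self, input_ids: Sequence[int]) -> bool:
        raise NotImplementedError


class Cohere2Parser(ReasoningParserBase):
    END_OF_REASONING_ID = 7  # hypothetical end-of-reasoning token id

    def is_reasoning_end(self, input_ids: Sequence[int]) -> bool:
        # Sequence[int] (not list[int]) matches the base class signature,
        # so both lists and tuples of token ids are accepted.
        return self.END_OF_REASONING_ID in input_ids
```

Narrowing the parameter to `list[int]` in the subclass would violate the base class contract (parameters should be contravariant), which is what type checkers like `ty` flag.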
walterbm-cohere pushed a commit to cohere-ai/melody that referenced this pull request Apr 14, 2026
## Description

This PR aims to support vLLM v0.15.1 and newer versions.
To do this, we introduce conditional import logic at the top of
`parser.py`
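Conditional import logic of this kind could look something like the generic helper below (a sketch only; the helper name is invented for illustration, and the vllm module paths in the docstring are assumptions about the post-reorganization layout):

```python
import importlib


def import_first(*paths):
    """Return the first module from *paths* that imports successfully.

    A plugin like parser.py could call this with the reorganized path first,
    e.g. "vllm.entrypoints.openai.chat_completion.protocol" (hypothetical),
    falling back to the legacy "vllm.entrypoints.openai.protocol".
    """
    errors = []
    for path in paths:
        try:
            return importlib.import_module(path)
        except ImportError as exc:
            errors.append(f"{path}: {exc}")
    raise ImportError("no candidate module importable: " + "; ".join(errors))
```

Because the fallback is attempted at import time, the same plugin file loads cleanly against both the old and the new vLLM layouts.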


## Related Issue


vllm-project/vllm#32240 introduces new
structure, which is a breaking change for melody

## Motivation and Context


Melody does not support vLLM v0.15 or newer versions.

## How Has This Been Tested?


### Check 1: Confirm that imports work with both vLLM versions

```
$ uv pip list | grep vllm
vllm                              0.15.1
$ uv run cohere_melody_vllm/parser.py
# no error
```

```
$ uv pip list | grep vllm
vllm                              0.14.1
$ uv run cohere_melody_vllm/parser.py
# no error
```


### Check 2: Tool calling works with vLLM v0.15.1

Start server

```
uv run vllm serve CohereLabs/c4ai-command-r7b-12-2024 --reasoning-parser cohere2 --reasoning-parser-plugin ./cohere_melody_vllm/parser.py --tool-parser-plugin ./cohere_melody_vllm/parser.py --tool-call-parser cohere2 --enable-auto-tool-choice
```

and then send a tool calling query

```
$ uv run tool.py
ChatCompletion(id='chatcmpl-a8a2b2e52a4dc558', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageFunctionToolCall(id='chatcmpl-tool-a9f3c1ba457d277f', function=Function(arguments='{"location": "San Francisco, California", "unit": "celsius"}', name='get_weather'), type='function')], reasoning='I will use the get_weather tool to find out the weather in San Francisco, California in Celsius.', reasoning_content='I will use the get_weather tool to find out the weather in San Francisco, California in Celsius.'), stop_reason=None, token_ids=None)], created=1776133545, model='CohereLabs/c4ai-command-r7b-12-2024', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=69, prompt_tokens=1302, total_tokens=1371, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)
'Function called: get_weather'
'Arguments: {"location": "San Francisco, California", "unit": "celsius"}'
'Result: Getting the weather for San Francisco, California in celsius...'
```

---

> [!NOTE]
> **Medium Risk**
> Adds version-dependent imports for vLLM OpenAI protocol types, so a mistake in version detection or module paths could cause runtime import failures across supported vLLM versions.
>
> **Overview**
> Adds vLLM version-aware import logic in `cohere_melody_vllm/parser.py` to handle the OpenAI entrypoint protocol module reorganization introduced after vLLM `0.14.1`, enabling the plugin to run against both old and new layouts.
>
> Updates the Python bindings CI `py-check` job to run `ty check` in a matrix against vLLM `0.14.1` and `0.15.1` to continuously validate compatibility.

Labels

deepseek Related to DeepSeek models frontend gpt-oss Related to GPT-OSS models llama Related to Llama models multi-modality Related to multi-modality (#4194) qwen Related to Qwen models ready ONLY add when PR is ready to merge/full CI is needed tool-calling v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants